ABSTRACT

This paper considers the feasibility of incorporating 
research results from cognitive science into the modeling of 
performance on psychometric tests and the construction of test items. 
The paper focuses on the feasibility of modeling performance on a 
three-dimensional rotation task within the context of Item Response 
Theory (IRT). To test the feasibility of psychometrically modeling
performance on this item type, an 80-item, three-dimensional rotation 
test was constructed using eight basic Shepard-Metzler figures. An 
inexpensive computer system was also developed to administer the test 
and record performance, including response-time data. Data were 
collected on high school juniors and seniors. As expected, angular 
disparity was a potent determinant of item difficulty. The 
applicability of IRT to these data was investigated by dichotomizing 
response time at several points and applying standard item parameter 
estimation procedures. It was concluded that an approach to
psychometric modeling that explicitly incorporates information on the 
mental models that examinees use in solving an item is not only
workable, but also essential for future developments in 
psychometrics. (Author/LPG)
Abstract

This paper considers the feasibility of incorporating research 
results from cognitive science into the modeling of performance on 
psychometric tests and the construction of test items. The paper
focuses on the feasibility of modeling performance on a three- 
dimensional rotation task within the context of Item Response Theory 
(IRT). Three-dimensional items were chosen because there is a rich
literature on the mental models that are used in their solution. To 
test the feasibility of psychometrically modeling performance on this
item type, an 80-item, three-dimensional rotation test was constructed.
An inexpensive computer system was also developed to administer the 
test and record performance, including response-time data. Data were 
collected on high school juniors and seniors. As expected, angular 
disparity was a potent determinant of item difficulty. The applic- 
ability of IRT to these data was investigated by dichotomizing 
response time at several points and applying standard item parameter 
estimation procedures. It was concluded that an approach to psycho- 
metric modeling that explicitly incorporates information on the mental 
models examinees use in solving an item is not only workable, but also
essential for future developments in psychometrics.
A Psychometric Analysis of a Three-dimensional Spatial Task

Isaac I. Bejar

Introduction 

The bulk of psychometric theorizing and testing practice has been 
unconcerned with very detailed descriptions of the mental processes that 
underlie performance on test items. Instead, the focus has been on broader 
constructs, such as aptitude and abilities. As a result, the item, the 
building block for constructing a test, does not play as large a role as 
perhaps it should in either test construction or psychometric modeling.
Thurstone, I think, deserves some of the credit, or rather some of the
blame, for this situation (see Stenner, Smith, & Burdick, 1983). He was
opposed to radical behaviorism and in reacting to it urged psychologists
not to let the stimulus be the driving force of psychology. Although this
reaction was not specifically concerned with testing, it is not hard to 
imagine that he might carry this perspective to the psychometric arena as 
well. 

Whether Thurstone is responsible or not, it is accurate to say that in 
much of test construction the items are viewed as replicates and of little 
intrinsic interest. The alternative perspective is that items, far from
being easily replaceable entities, are important in their own right and 
that accounting for differences among them with respect to their psycho- 
metric characteristics will aid our understanding of what a test measures 
(e.g., Bejar, 1985; Embretson, 1983), just as accounting for differences in
total score variation improves our understanding of what a test measures. 

Whereas the preferred methodology for modeling test score variation 
has been factor analysis, the methodologies of cognitive science (see 
Miller, Polson, & Kintsch, 1984) appear to be suited to undertake the
validation of tests from this perspective. This report focuses on mental 
rotation research and has the broad objective of exploring the feasibility 
of incorporating cognitive research with psychometric modeling and item 
construction. This will be investigated by examining data from a
well-studied cognitive task, the three-dimensional mental rotation task, from a
psychometric perspective. Specifically, the present research aims to capi- 
talize on the significant amount of research produced in the last fifteen 
years in the area of spatial cognition. Much of this research has focused 
on the type of representation used by subjects to solve spatial problems. 
This progress is significant. Charles Myers, who, at ETS in the 1950s,
conducted much research on spatial ability for the College Board, said that

In this report we use the term "spatial ability" 
to represent a complex family of abilities with 
unknown interrelationships. We do not yet know
of a terminology that permits a more precise and 
efficient language. (Myers, 1958, p. 24) 

By contrast, Cooper and Shepard (1984) recently concluded after reviewing
their work on mental rotation that

In spite of some unresolved issues, the close match 
we have found between mental rotations and their
counterparts in the physical world leads inevitably 
to speculations about the functions and origin of 
human spatial imagination. It may not be premature 
to propose that spatial imagination has evolved as 
a reflection of the physics and geometry of the 
external world. The rules that govern structures 
and motions in the physical world may, over evolu- 
tionary history, have been incorporated into human 
perceptual machinery, giving rise to demonstrable 
correspondences between mental imagery and its 
physical analogues. (p. 114)

In the intervening period significant research and theorizing from the
factor-analytic and the experimental perspective have occurred. Much of
that work has been reviewed (Corballis, 1982; Lohman, 1979; McGee, 1979).
What emerges from a distillation of the literature is the presence of
three mental factors, namely spatial relations, spatial visualization, and
spatial orientation. These factors have been investigated by cognitive 
psychologists (e.g., Pellegrino & Kail, 1982). But a specific task under
the spatial relations factor has received so much attention that Corballis 
(1982) has raised it to the level of paradigm. That task is the
three-dimensional mental rotation task. A typical stimulus used for this kind of
research appears in Figure 1.



Insert Figure 1 About Here 



The most significant result from this line of research has been the
seemingly universal finding that one feature of these stimuli, namely 
angular disparity, controls the response time (e.g., Cooper, 1980; Shepard 
& Metzler, 1971). By contrast, within psychometric settings it is usually 
difficult to obtain a priori predictions of the psychometric difficulty of 
the item, let alone of its response time. That is, within psychometrics we
are often content to estimate difficulty. But, of course, estimating is 
not explaining. 

It is not unreasonable to suggest that we often focus on estimating 
rather than explaining difficulties because of the absence of a valid 
psychological model for solving the item. To the extent that the psycho- 
logical model is concerned with the effort required to solve the item it 
may be feasible to accurately predict an item's psychometric character- 
istics, especially its difficulty. If this can be done with enough 
precision it is in principle an alternative procedure, or at least an
additional source of information, for estimating the difficulty of items. 
For example, it may be feasible to obtain valid estimates of difficulty by 
combining information about the psychological demands of the items and a 
small sample of subjects, instead of administering the test to a large 
sample of potential examinees. The implementation of this approach would 
require procedures for estimating the parameters in a psychometric model 
that are capable of incorporating "prior" information into the estimation 
process. 

The foundation for incorporating prior information into the estimation 
of psychometric parameters is being laid (e.g., Bock & Aitkin, 1981;
Swaminathan & Gifford, 1981; Tsutakawa & Lin, 1984). Moreover, commercially
available programs exist capable of handling some forms of prior
information (Assessment Systems Corporation, 1984; Mislevy & Bock, 1982).
Since the production of collateral information on the item would be based 
on an understanding of how examinees solve the item, the possibility also 
exists that at some point it might be possible to build, on the basis of 
that knowledge, systems whereby an item writer could receive feedback on 
the likely psychometric characteristics of a prospective item before it is 
ever administered to an examinee (see Bejar, Stabler, & Camp, 1986; Bejar
& Yocom, 1986).

Psychometric Modeling of Spatial Rotation Data 

The application of the foregoing to the psychometric modeling of 
spatial ability suggests as a criterion of success the determination of 
psychometric difficulty in terms of item attributes, or specifically, in 
the present case, linking psychometric difficulty to angular disparity. 
The mildest criterion, perhaps, is that difficulty should increase as
angular disparity increases. A stronger criterion is that difficulty
should increase in a linear fashion with angular disparity or some trans- 
formation thereof. If such a relationship can be established our inter- 
pretation of the data can be considerably more descriptive. In the absence 
of such a linkage, difficulty in an IRT response model is defined, when
guessing is not present, as the point on the ability scale at which there
is a 50-50 chance of responding correctly. When difficulty is linked to an
item attribute, such as angular disparity, we can reference performance to 
that attribute. Thus we could speak of ability as the level required to 
achieve a 50-50 chance of success on a task involving a certain degree of 
angular disparity. In short, relating psychometric parameters to a mental 
model of the item solution process is likely to improve the interpretation 
of psychometric results. 

Since the three-dimensional mental rotation item does not require
problem solving, the time to obtain a correct response is directly 
interpretable as the efficiency with which the mental rotation takes place. 
Therefore, in fitting a psychometric model to these data, both accuracy and 
response time should be taken into consideration. Considering both 
responses suggests an expansion of the criterion described above. That is, 
in addition to expecting an increase in difficulty as a function of angular 
disparity, we should expect that the relationship between angular disparity 
and difficulty would remain the same as the time limit to perform the task 
is increased. Figure 2 illustrates the expected relationship. In general, 
however, allowance must be made for the possibility that the intercept is
not linear in time. That is, in general, the gap between lines may not be 
a constant. 
Insert Figure 2 About Here

Modeling Response Latency 

A strategy for incorporating response latency into psychometric 
modeling has been proposed recently by Bloxom (1985); therefore, a review 
of the relevant literature will not be attempted here. Here we will focus 
on a discussion of modeling response latency as an extension of models for 
dichotomous data. The approach we follow is to fit a dichotomous item
response model to response times to a set of 80 three-dimensional 
rotation items. The objective is to determine whether a more refined 
psychometric model should be attempted, not to provide the definitive
calibration of these data.

A common model for the probability of dichotomous response when time
is not a factor is the two-parameter logistic model:

$$P(u_i = 1 \mid \theta) = \frac{1}{1 + e^{-D a_i(\theta - b_i)}} \qquad [1]$$

where $a_i$ is the discrimination, $b_i$ is the difficulty parameter, $D$ is a
scaling constant, and $\theta$ is
ability. Now consider a situation where the interest is on the probability 
of a correct response after a certain period of time has elapsed. We would 
expect that, at least with certain item types, the longer an item is
considered, the higher the probability of a correct response. Figure 3
conveys this notion. In effect we have an equation such as Equation [1] after
increasing amounts of time have elapsed. Figure 3 has a constraint that is 
essential for interpretability, namely that the curves only differ on their 
inflection point. This means that the discrimination parameter is constant
across time, but the difficulty parameter varies as a function of time. 
Put differently, the probability of a correct response after increasing 
elapsed times is solely a function of time. Micko (1969) has applied this
idea by specifying the dichotomous item response model to be the Rasch 
model. 



Insert Figure 3 About Here 



Although it is in principle possible to model response time with a 
dichotomous item response model it is also possible to generalize the 
dichotomous model as Samejima (1973) has done. In this generalization the
continuous response is converted to a 0-1 interval. For response latency 
this means that the response is expressed as a proportion of the total
allowed time for responding to an item. If the time limit is 15 seconds,
for example, a response latency of 5 seconds would be .33. Samejima
refers to the response expressed in this manner as z. There is nothing in 
the model concerning whether the response time is for a correct or an 
incorrect response. Such a distinction must be made for scoring purposes; 
however, this will be discussed in a subsequent report. Basically, the 
idea is to treat responses as correct if, in fact, a correct response is
produced after s seconds, and as not-correct-yet if an incorrect response
is given. That is, incorrect responses are treated as incomplete responses 
indicating that "the last time we looked, namely after s seconds, the 
individual had not produced a correct response." In statistical 
terminology, incorrect responses are treated as censored observations, as 
in survival analysis (e.g., Miller, 1981).
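
The rescaling and censoring convention just described can be illustrated with a minimal Python sketch. It is not the scoring procedure of the study, which is deferred to a later report; the names `score_latency` and `ScoredLatency` are hypothetical, and only the 15-second limit and the 5-second example come from the text.

```python
from dataclasses import dataclass

TIME_LIMIT_S = 15.0  # per-item time limit stated in the text


@dataclass
class ScoredLatency:
    z: float        # latency as a proportion of the time limit, in [0, 1]
    censored: bool  # True when no correct response was observed


def score_latency(latency_s: float, correct: bool,
                  time_limit_s: float = TIME_LIMIT_S) -> ScoredLatency:
    """Express a latency on the 0-1 scale and flag censored observations."""
    if latency_s < 0:
        raise ValueError("latency must be non-negative")
    # A latency beyond the limit is clipped, so z never exceeds 1.
    z = min(latency_s, time_limit_s) / time_limit_s
    # An incorrect response is treated as "not correct yet", i.e. censored.
    return ScoredLatency(z=z, censored=not correct)


example = score_latency(5.0, correct=True)
print(round(example.z, 2), example.censored)  # prints: 0.33 False
```

Keeping the censoring flag separate from z mirrors the survival-analysis view: the latency of an incorrect response is retained as a lower bound rather than discarded.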



With this in mind, the probability that someone of ability $\theta$ takes
longer than z to respond is given by

$$P^{*}_{z_i}(\theta) = \int_{-\infty}^{a_i(\theta - b_{z_i})} D \exp(-Dt)\,[1 + \exp(-Dt)]^{-2}\,dt = \left[1 + e^{-D a_i(\theta - b_{z_i})}\right]^{-1} \qquad [2]$$

which is similar to Equation [1] except that the difficulty parameter now 
is a function of z, response time. The difficulty function $b_{z_i}$ is not
constrained to any particular shape other than that it be monotonic. In
this paper we will investigate the fit of a linear function. That is, we
are interested in $b_{z_i}$'s of the form

$$b_{z_i} = \phi_i + \delta_i z_i \qquad [3]$$

$\phi_i$ could be further decomposed into components associated with figure
attributes, but our focus here is on the adequacy of a linear difficulty 
function when angular disparity and time are taken into consideration. 
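
As a rough numerical illustration of Equations [2] and [3], the sketch below evaluates the closed form of the response function with a linear difficulty function. The symbol names `phi` and `delta`, the value D = 1.7, and all parameter values are assumptions made for illustration. A negative `delta` is assumed so that difficulty falls as more of the allowed time elapses, in the spirit of Figure 3.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed value)


def difficulty(z: float, phi: float, delta: float) -> float:
    """Linear difficulty function in the spirit of Equation [3]."""
    return phi + delta * z


def p_star(theta: float, z: float, a: float, phi: float, delta: float) -> float:
    """Closed form of Equation [2] with the linear difficulty function."""
    b_z = difficulty(z, phi, delta)
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b_z)))


# Hypothetical parameter values, chosen only for illustration.
a, phi, delta = 1.0, 1.0, -2.0
for z in (0.2, 0.5, 0.8):
    print(z, round(p_star(0.0, z, a, phi, delta), 3))
```

With these values the difficulty at z = 0.5 equals the ability of 0, so the function returns exactly 0.5 there, which is the usual IRT reading of the difficulty parameter.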

An interesting implication of this model is its possible compatibility
with the "slope and intercept" methodology commonly used by cognitive 
researchers interested in individual differences (e.g., Lansman, Donaldson,
Hunt, & Yantis, 1982). The slope and intercept methodology calls for
computing the regression of response time on item attributes such as 
angular disparity, for each subject. The slope and intercept of that 
regression are taken to be estimates of individual differences parameters. 
The slope, for example, is taken to be an indicator of speed of processing 
while the intercept is interpreted to include a series of "overhead"
processes such as encoding. It is not clear from users of this methodology
what relationship ought to exist between these two parameters. However, in 
practice they are often correlated. In the Lansman et al. study, for
example, the slope on paper and pencil three-dimensional mental rotation 
tests correlated .43 and .32 with the slope and intercept, respec- 
tively, of a computer version of the same test. That is, the slope and 
intercept seem to be picking up substantially the same individual 
difference variable. 
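
The slope and intercept methodology amounts to one least-squares line per subject. The sketch below shows that computation in Python under stated assumptions: the function name `fit_line` is hypothetical, and the response times are made-up values, not data from this study or from Lansman et al.

```python
ANGLES = [20.0, 60.0, 100.0, 140.0, 180.0]  # degrees, as used in this study


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up mean response times (seconds) for two hypothetical subjects.
subject_rts = {
    "s01": [2.0, 2.8, 3.6, 4.4, 5.2],
    "s02": [1.5, 2.5, 3.5, 4.5, 5.5],
}
for subject, rts in subject_rts.items():
    slope, intercept = fit_line(ANGLES, rts)
    # slope: seconds per degree; intercept: seconds at zero disparity
    print(subject, round(slope, 4), round(intercept, 2))
```

The resulting slope and intercept pairs are what would then be correlated across subjects or with other measures.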

The model in Equation [2] predicts that the relationship between 
response time and item complexity (as reflected by the difficulty para- 
meter) for a given individual is linear and increasing for a fixed level of 
accuracy; that is, proportion correct. This is consistent with the "slope 
and intercept" methodology. However, the model also predicts that the 
slope of that relationship is constant across all levels of ability while 
the intercept varies with ability. This is not necessarily consistent with
the slope and intercept methodology since the slope of the regression of 
response time on angular disparity has been generally interpreted as the 
rate at which the subjects mentally rotate the object. Thus, according to 
the model in Equation [2] the locus of individual difference, when accuracy 
is constant, is not in the rate of rotation, the slope, but rather on the 
intercept which is often associated with the encoding and other "overhead" 
processes of the item. From a substantive point of view the distinction is 
important since it is precisely the interpretation that subjects mentally 
rotate the figures that has generated so much interest in this line of 
research. 
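
The constant-slope prediction can be made explicit with a short derivation. The sketch below is an illustration rather than the paper's Appendix A: it assumes the linear difficulty function of Equation [3], written here as $b_z = \phi_i + \delta_i z$ with symbol names assumed, and holds the right-hand side of Equation [2] at a fixed accuracy level $p$.

```latex
\begin{align*}
p &= \left[1 + e^{-D a_i(\theta - \phi_i - \delta_i z)}\right]^{-1} \\
\ln\frac{p}{1-p} &= D a_i(\theta - \phi_i - \delta_i z) \\
z &= \frac{\theta - \phi_i}{\delta_i} - \frac{1}{D a_i \delta_i}\,\ln\frac{p}{1-p}
\end{align*}
```

Under these assumptions, and for items sharing the same $a_i$ and $\delta_i$, z is linear in $\phi_i$ with slope $-1/\delta_i$, which does not involve $\theta$. Ability and the accuracy level enter only through the intercept, and the slope is positive when $\delta_i$ is negative, that is, when difficulty falls as time elapses.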
Design

To secure data for the study we recruited 160 high school students 
from a local high school. The items used in this study consisted of 80
pairs of three-dimensional Shepard-Metzler figures. The following eight 
figures were used (A1, A2, A3, C1, C2, C3, E1, and E2).¹ For each figure
true and false pairs were constructed by rotating at angular disparities of
20, 60, 100, 140, and 180 degrees. The true pairs were constructed by rotating the
same figure along the picture axis. The false pairs were constructed by 
rotating the mirror image instead. Altogether there were 16 items at 
each angular disparity. The resulting 80 items were videotaped and 
placed on a videodisc using the 3M mastering process. 
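
The factorial structure of the item set (eight figures, five angular disparities, true and false pairs) can be checked with a few lines of Python. This is only a bookkeeping sketch; the dictionary layout is an assumption and says nothing about how the stimuli were actually stored.

```python
from itertools import product

FIGURES = ["A1", "A2", "A3", "C1", "C2", "C3", "E1", "E2"]
ANGLES = [20, 60, 100, 140, 180]   # angular disparity in degrees
PAIR_TYPES = ["true", "false"]     # same figure vs. mirror image

# One item per figure x angle x pair-type combination.
items = [
    {"figure": f, "angle": a, "pair": p}
    for f, a, p in product(FIGURES, ANGLES, PAIR_TYPES)
]
per_angle = {a: sum(1 for it in items if it["angle"] == a) for a in ANGLES}
print(len(items), per_angle[20])  # prints: 80 16
```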

The items were presented in two different orders. In one, the 
examinees saw items at 100, 60, 180, 140, and 20 degrees. At each angular 
disparity there were 16 possible items, and one of those was chosen at 
random. With the second ordering, subjects worked items of 20, 140, 180, 
60, and 100 degrees. Approximately one-half of the subjects took the items
in each order.

The instrumentation for each data collection station consisted of the 
following components: 

* A 64K microcomputer with 1-disc drive (Radio Shack 26-3127) 

* A videodisc player (Pioneer PR 8210) 

* An Amdek Color I Monitor 

* A Joystick (Radio Shack, 26-3012) 

* A computer-to-videodisc interface (especially constructed 
for the project) 

¹ The author is indebted to Professor Roger Shepard of Stanford University
for providing these materials.
The microcomputer was programmed to control the videodisc as well as record
the responses. Response time was recorded in "ticks" where a tick is a 
60th of a second. Subjects responded by means of a joystick connected to 
the computer. A "yes" response was signaled by moving the joystick for- 
ward, while a "no" response was indicated by moving the joystick backward. 

Because of the potential unfamiliarity of the equipment, at least as a 
psychological testing device, careful attention was given to the instruc- 
tions. Instructions were tested with several students unrelated to the 
study to ensure that they were fully understandable. Students were told
that they were to respond by moving the joystick and that they were to 
respond as quickly as possible without sacrificing accuracy. The instruc- 
tions appear in Appendix B. 

The examinee's first task was a simple one, namely to
indicate whether an arrow was pointing up or down. This was done to famil- 
iarize each examinee with the response device as well as to time their re- 
action time to a task with almost no cognitive load. From these responses 
it is possible to obtain an estimate of the motor speed of individuals. 
These data were not analyzed as part of this study, however. 

After the arrow task, subjects were given instructions on the rotation 
test. As part of the instructions they were able to manipulate an 
animation sequence containing a true and a false item. This allowed the 
examinee to become familiar with true and false items at all possible 
angles. Also, seeing the rotation in real time, examinees may have been 
encouraged to use a rotation strategy to solve the items. The examinees 
were allowed to do six practice items. The practice items were followed by 
80 real items. There was a 15-second time limit for each item. After 
15 seconds a "timeout" message was given if the subject had not responded. 
However, students could pace themselves in the sense that they controlled
when the next item was administered. At the end of each item students were 
told whether they had responded correctly or incorrectly. 
Parameter Estimation 

With the growing interest in the psychometric modeling of response 
time (Bloxom, 1985; Scheiblechner, 1985; Thissen, 1983), it is likely that
estimation procedures tailored to response time will be forthcoming. In 
the meantime it is possible to obtain estimates of item parameters through 
estimation procedures designed for the dichotomous case. That is the 
approach taken here. In a nutshell, the approach calls for successively 
dichotomizing response time and fitting a one-parameter logistic model at 
each dichotomization point. Imposing a one-parameter model across angular 
disparities implements the constraint that discrimination be constant 
across time. 

Each basic item, i.e., pair of distinct figures, was fitted sepa- 
rately. Figure 4 shows the structure of the data matrix for a given basic
item. The notation "0/1," which occurs only on the upper left corner, 
indicates that the entries in that section of the matrix could be 1, a 
correct response was given, or 0, a correct response was not given. 



Insert Figure 4 About Here 



The notation "0/1/2" indicates that responses in that block could be 0, a 
correct response was not given; 1, a correct response was given, or 2, 
the "item" was not presented. A response within this type of block was 
coded 2 if in a preceding block a correct response was not given to that 
item. For example, if at the end of the three-second interval an examinee 
responds "false" to a true item, then that item is coded 2 in subsequent
intervals. The notation 2 simply means that all the items in that block 
were treated as not presented. For each basic figure there were 50
"items" corresponding to the five angular disparities, the true/false
classification, and 5 time intervals.
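
The 0/1/2 coding described above can be sketched as follows. This is one reading of the scheme, not the program used in the study. The cut points (five equal intervals within the 15-second limit) are an assumption, as is the rule that a correct response stays coded 1 at every later cut point. The function name `code_response` is hypothetical.

```python
NOT_CORRECT, CORRECT, NOT_PRESENTED = 0, 1, 2

# Assumed cut points in seconds: five equal intervals within the 15-second limit.
CUT_POINTS_S = (3.0, 6.0, 9.0, 12.0, 15.0)


def code_response(latency_s, correct, cut_points=CUT_POINTS_S):
    """Code one response as a row of pseudo-items, one per time cut point.

    latency_s is None when the examinee timed out without responding.
    """
    codes = []
    failed = False  # becomes True once an incorrect response has been coded 0
    for cut in cut_points:
        if latency_s is None or latency_s > cut:
            codes.append(NOT_CORRECT)    # no response yet at this cut point
        elif correct:
            codes.append(CORRECT)        # correct response produced by this cut
        elif not failed:
            codes.append(NOT_CORRECT)    # incorrect response falls in this interval
            failed = True
        else:
            codes.append(NOT_PRESENTED)  # incorrect response in an earlier interval
    return codes


print(code_response(4.2, True))    # prints: [0, 1, 1, 1, 1]
print(code_response(2.5, False))   # prints: [0, 2, 2, 2, 2]
print(code_response(None, False))  # prints: [0, 0, 0, 0, 0]
```

Stacking such rows across examinees and across the ten angle-by-truth conditions of a basic figure would give a matrix with 50 pseudo-item columns, matching the count given in the text.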

Each of the eight data matrices was analyzed separately with BILOG
(Mislevy & Bock, 1982) specifying a one-parameter logistic model. The 
resulting estimates were rescaled with respect to the distribution of
ability estimates obtained with the EAP algorithm.
Results 

Unlike the typical mental rotation experiment in which subjects
receive a great deal of practice time, the subjects in this study spent 
altogether no more than forty minutes, including instruction and practice 
items, on the mental rotation task. Therefore, it is important to verify 
that the usual finding concerning the linearity of response time on angular 
disparity is replicated in this case. Figure 5 shows the relationship 
between angular disparity and response time for correct responses for true 
and false items across the eight basic figures. Figure 5 suggests that 
there is, for the most part, a good linear fit to the data. The largest 
residual is at 100 degrees. Apart from this, there is relatively little 
scatter around the best-fitted line. In fact, the fit appears better than 
has been reported in some studies. The impact of angular disparity on 
response time is less potent for false items, as Figure 5 shows. 



Insert Figure 5 About Here 



To assess the fit of Equation [2] we will examine two criteria. One 
is the relationship of difficulty to angular disparity; the other is the
relationship of reaction time and angular disparity for subjects of differ- 
ent ability levels. In the first instance, on the basis of Equation [2] we 
expect the relationship to be linear with angular disparity and to maintain 
the same slope as time goes by. Secondly, if Equation [2] is a valid model
for these data we expect to find that for single subjects, or groups of 
subjects, the slope of the relationship of response time on angular dis- 
parity is constant across these subjects or groups. 

Figure 6 shows the results from the item calibration for each basic 
item. Figure 7 shows the equivalent plots for the false items. The first 
expectation is largely fulfilled for the true items. That is, there is a 
nearly perfect linear relationship between response time and angular 
disparity. Moreover, the slope of the best-fitting line does not change 
with time. The major deviation from expectation occurs at 100 degrees. 
This angular disparity proved to be consistently more difficult than 
expected. In addition, for some items, the relationship between difficulty 
and angular disparity appears to be nonlinear beyond 5 seconds. For the 
false items, however, it is clear that angular disparity is not a deter- 
minant of response time in these data (but may be so with highly practiced 
subjects).



Figures 8 and 9 show the mean difficulty estimates across the eight 
basic figures for true and false items. In Figure 8 we notice again the 
discrepancy at 100 degrees and a slight deviation from linearity beyond 
5 seconds. 



Insert Figures 6 and 7 About Here 
Insert Figures 8 and 9 About Here



The second prediction we wish to test is that the individual differ- 
ences are reflected on the intercept and not the slope. To invalidate this 
prediction it is sufficient to show that it is not so for individual exam- 
inees or groups of examinees that differ in spatial ability. Since we did 
not have available an independent measure of spatial ability, the approach
taken here is to compare two groups widely different on SAT-M. Two groups
were formed based on the upper- and lower-third scores on SAT-M. The
higher group consisted of 38 examinees with a mean SAT-M of 713; the lower
group consisted of 31 students with a mean of 444. Figure 10 shows the
relationship between proportion correct and mean reaction time² (i.e.,
regardless of whether the response was correct or incorrect) with angular 
disparity for these two groups. The higher SAT group has a higher slope,
suggesting that they are not rotating as fast, but also a smaller inter- 
cept. Thus, other things being equal, the higher SAT examinees are faster 
on the items with smaller angular disparity and about as fast on the higher 
angular disparities. It is also true, as can be seen in Figure 10, that 
the accuracy rate is different, being higher for the high SAT-M group on 
the more discrepant items. 



² Since the model does not specifically distinguish between correct and
incorrect responses and focuses on the modeling of response time, it is 
more appropriate to plot response time rather than response time of 
correct responses in the context of model fitting. Nevertheless, the plot 
with mean correct response time was also done, but similar results were 
obtained. 
Insert Figure 10 About Here



Discussion 

The motivation behind this work has been both substantive and method- 
ological. The substantive interest has been the assessment of the conver- 
gence or lack thereof of two approaches to individual differences: a 
psychometric approach and an approach inspired by cognitive psychology. 
Hunt and MacLeod (1976) have expressed concerns that the differences 
between these two approaches may be irreconcilable. They give as an 
example the tendency for psychometricians to focus on global scores, such
as proportion correct, versus the tendency of the cognitive psychologist to 
focus on more reductionist parameters such as slope and intercepts of 
performance on task attributes. This paper shows that the dichotomy need 
not exist, at least not when we adopt IRT as a psychometric framework. For 
example, person characteristic curve methodology (Carroll, Meade, & 
Johnson, in preparation; Trabin & Weiss, 1983) is well suited to charac- 
terize subject performance in a psychologically meaningful fashion. This 
paper demonstrates that the IRT framework can be expanded when the response 
of interest is the response time and in doing so demonstrates the possi- 
bility of encompassing both global and reductionist views of individual 
differences within the same measurement framework. 

According to Equation [2] the locus of individual differences is on 
the intercept and not on the slope. The results presented in Figure 10 
show that the relationship of reaction time to angular disparity has a 
different slope and intercept for groups of presumably different spatial 
ability. If we believe that the intercept is the correct indicator of 
ability to rotate, then we would conclude that the higher SAT group is
higher on that ability. That conclusion would be consistent with existing 
literature (e.g., Fennema & Sherman, 1977). If we believe that the slope 
is the correct indicator, we would conclude that the high SAT group is of 
lower ability, a finding which would not be consistent with the literature. 
A reconciliation of these opposing conclusions lies in an accounting of the 
differences between the high and low SAT-M group on their accuracy. That 
is, the slope of the relationship between response time and angular
disparity cannot be compared unless accuracy is constant in the two groups
(cf. Kail, 1985).

According to Equation [2], the slope of the relationship between the 
magnitude of the response, z, and difficulty can be controlled through 
the accuracy parameter. (See Appendix A). For a given ability level a 
smaller slope can be obtained by reducing accuracy as difficulty increases. 
A larger slope will be obtained, for example, by holding accuracy constant 
as difficulty increases. Therefore, the differences in slope seen in 
Figure 10 do not necessarily violate the prediction of the model and could 
well be eliminated after adjusting for accuracy. Moreover, the adjustment 
would not have to be done explicitly in practice since, in the estimation
of ability, accuracy and speed are taken into account. That is, IRT may 
provide a solution to the speed-accuracy tradeoff problem (see Thissen, 
1983). The estimation of ability will be treated in a subsequent report. 

The second motivation for this work has been purely psychometric, and 
the work focuses on the development of a generative approach to psycho- 
metric modeling and test administration (Bejar & Yocom, 1986). By a 
generative approach I mean a methodology where the generation of the items 
is controlled by an algorithm encoding sufficient knowledge about the 
mental processes underlying performance on the item that it is capable of 
anticipating the psychometric characteristics of the item before it is 
administered to examinees. The goals of this methodology are a natural 
extension of adaptive testing (Weiss, 1983). In an adaptive test a com- 
puter retrieves from an existing database (containing previously calibrated 
items) the item that is most informative for a given individual. However, 
with a generative approach, an item, instead of being retrieved, is created 
specifically for an individual in such a way that the anticipated psycho- 
metric characteristics of the items are maximally informative for each 
individual. 
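
The retrieval step of an adaptive test can be sketched as follows. This is a minimal illustration, assuming a two-parameter logistic model with the conventional scaling constant D = 1.7; the item bank, its parameter values, and the function names are hypothetical and are not taken from the study.

```python
import math

D = 1.7  # conventional logistic scaling constant (an assumption here)

def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a two-parameter logistic item at ability theta."""
    p = p_correct(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)

def most_informative(theta, bank):
    """Return the (a, b) pair in the bank with maximum information at theta."""
    return max(bank, key=lambda item: item_information(theta, *item))

# Hypothetical calibrated bank: (discrimination, difficulty) pairs.
bank = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]

# With equal discriminations, the selected item is the one whose
# difficulty lies closest to the current ability estimate.
print(most_informative(0.9, bank))
```

A generative procedure would replace the lookup over a fixed bank with a step that constructs an item whose anticipated parameters maximize the same criterion.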

One of the goals of adaptive testing has been the achievement of high 
measurement precision throughout the ability range. This is a special 
concern with dichotomous items where a balance must be struck between 
overall level of precision and distribution of precision at different 
levels of ability. As we move from dichotomous to continuous responses 
that balance takes care of itself in the sense that for continuous response 
models information may be high throughout the ability range and in some 
cases equally high at all ability levels (Samejima, 1973). Therefore, that 
goal of adaptive testing seems to be automatically satisfied through the 
use of continuous responses. Nevertheless, there may be advantages to 
adapting the angular disparity to individuals of different abilities as a 
means of ensuring a common response strategy by all examinees. 

The two essential ingredients in implementing a generative approach 
are (a) a psychometric framework of sufficient flexibility; and (b) a 
knowledge base about performance on the item. The three-dimensional cube 
item was chosen for this study because of the potent effect of angular 
disparity on performance in mental rotation items. It is therefore 
possible to establish the relationship between angular disparity and 
difficulty based on a few points. By manipulating this feature of the item 
it becomes possible to generate items of any arbitrary difficulty. In a 
practical implementation of this idea we may have, say, 30 basic items 
that can be presented at rotations ranging from 20 to 180 degrees. Although 
everyone would be presented all 30 items, the rotation at which they are 
actually presented would be different for different individuals. 
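
As a rough sketch of how such an item could be generated, suppose difficulty has been calibrated at a few angular disparities for one basic figure and that the relationship is linear over the usable range. The calibration values and function names below are hypothetical; only the 20 to 180 degree range comes from the text.

```python
def fit_line(angles, difficulties):
    """Ordinary least-squares fit of difficulty = intercept + slope * angle."""
    n = len(angles)
    mean_x = sum(angles) / n
    mean_y = sum(difficulties) / n
    sxx = sum((x - mean_x) ** 2 for x in angles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(angles, difficulties))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def angle_for_difficulty(target, intercept, slope, lo=20.0, hi=180.0):
    """Invert the fitted line, then clamp to the rotations that can be shown.

    Assumes a nonzero slope, i.e. that difficulty does vary with angle.
    """
    angle = (target - intercept) / slope
    return min(hi, max(lo, angle))

# Hypothetical calibration points for one basic figure (not data from the study).
angles = [20.0, 100.0, 180.0]
difficulties = [-1.0, 0.0, 1.0]

intercept, slope = fit_line(angles, difficulties)
print(angle_for_difficulty(0.5, intercept, slope))
```

Targets outside the calibrated range are clamped to the nearest displayable rotation, so the most extreme difficulties cannot be reached with a single figure under this scheme.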

The measurement model that was explored in this paper to explain 
performance on a three-dimensional rotation item is an extension of 
existing item response models in current use for the dichotomous response 
case (Samejima, 1983). Two predictions of this model were explored as a 
means of testing its fit. One prediction concerned the linearity between 
difficulty and angular disparity. The results suggested that the predic- 
tion was substantially satisfied. The exception was at 100 degrees which 
was found to be a more difficult angular disparity than expected. Also, 
beyond 5 seconds a nonlinear relationship appears to emerge. 

Implicit in this prediction is the assumption that both true and false 
versions of an item would have the same slope. This was distinctly not the 
case. That is, it appears that a two-dimensional model would be required 
to provide an adequate description of the true and false data. (To assess 
the bias that may have been introduced into the item parameter estimates as 
a result of the bi-dimensionality, the model was fitted to true items only, 
but no noticeable difference could be seen in the resulting estimates.) 

The second prediction was concerned with the interpretation of slope 
and intercept parameters. According to the model, individuals of different 
ability have the same slope but different intercepts. The results sug- 
gested that considering the different accuracy rates of a high- and 
low-scoring SAT-M group, the differences in slopes could not necessarily be 
interpreted as valid indicators of rotation speed. In effect, our results 
suggest that the bulk of individual differences is on the intercept rather 
than the slope parameter. This does not necessarily conflict with research 
suggesting that rotation speed is an important individual-differences 
variable since examinees were exposed to the items for a relatively short 
period of time. Indeed, in contrast to other research, the males and 
females studied did not differ on the slopes and intercepts (Bejar & Harvey, 
in progress). Because sex differences in tasks involving mental rotation 
appear to be a well-established fact (Linn & Petersen, 1985), and because 
of the relatively brief exposure that subjects had, our data may not be 
typical. 

While results of the study show that the idea of generating items of 
arbitrary difficulty is indeed feasible, there are some difficulties that 
must be borne in mind with even relatively simple stimuli, such as the 
three-dimensional rotation items. Specifically, performance on false items 
is not a function of angular disparity; on the surface this finding there- 
fore suggests that performance on the false items is controlled by a 
different combination of mental processes (see Carter, Pazak, & Kail, 
1983). In short, further psychometric examination of the three-dimensional 
cubes could focus on multidimensional modeling of processing and decision 
processes and on models that characterize the change in performance as a 
function of practice. 

Figure 1 

Sample True and False Three-Dimensional Rotation Items 

Figure 2 

Hypothetical Relationship Between Difficulty 
and Angular Disparity as a Function of Time 

Figure 3 

Hypothetical Item Response Functions as a Function of Time 

Figure 4 

Data Matrix for True and False Versions of an Item at Successive Dichotomization Points and Angular Disparities 

[Figure not reproduced. Rows are subjects 1 through 7. Columns are the true (T) and false (F) versions of an item at angular disparities of 20°, 60°, 100°, 140°, and 180°, repeated for dichotomization points of 3, 4, and 5 seconds. Cell entries appear as 0/1, 0/1/2, or 2.] 

Figure 5 

Relationship Between Reaction Time and Angular Disparity for 
True and False Versions Averaged Across the Eight Items 

Figure 6 

Relationship Between Difficulty and Angular Disparity for Each 
of the Eight Basic Items as a Function of Time (True Version) 

Figure 7 

Relationship Between Difficulty and Angular Disparity for Each 
of the Eight Basic Items as a Function of Time (False Version) 

Figure 8 

Relationship Between Difficulty and Angular Disparity as a 
Function of Time Averaged Across All Items (True Version) 

Figure 9 

Relationship Between Difficulty and Angular Disparity as a 
Function of Time Averaged Across Eight Items (False Version) 

Figure 10 

Relationship Between Proportion Correct and Reaction Time as a 
Function of Angular Disparity for High and Low Scoring SAT-M Groups 

APPENDIX A 



To explore the implications of the model it is a matter of substi- 
tuting values for the parameters. In this appendix we will show the 
relationship between reaction time and difficulty and show how the slope of 
the relationship between the response and difficulty can be manipulated by 
changing accuracy. 

The response model is 

$$P_z = \frac{1}{1 + e^{-D a_i (\theta - b_z)}} \qquad [A1]$$

We will interpret $P_z$ as the accuracy and $z$ as the response time. We 
wish to observe the response time as a function of accuracy. (Actually, 
since the model is oriented such that a large response is associated with 
higher $\theta$, we will study the relationship with $1-z$ instead.) For 
illustration purposes we will assume there are four items of the same 
discrimination but different difficulty and will examine the results of the 
model for ability level $\theta = 1$. More concretely, assume: 

$$a_k = 1.00 \quad (k = 1, \ldots, 4)$$

$$b_{z_k} = 3z + \phi_k \quad (k = 1, \ldots, 4)$$

$$\phi_k = .25k - .25 \quad (k = 1, \ldots, 4)$$

$$\theta = 1.0$$

To simulate different response modes we will vary $P_z$ and observe 
what happens to (1-z). The results for three "response modes" appear in 
Table A1. Column A has the results for a careless mode. That is, the 
response (1-z) remains constant as difficulty increases but accuracy 
decreases from .60 to .30. For mode C, however, (1-z) increases from 
.75 to .99, and accuracy remains constant. That is, in order to maintain 
a constant accuracy rate on increasingly difficult items, 1-z must 
increase. 

Table A1 

             A                B                C 
 k     (1-z)    P*      (1-z)    P*      (1-z)    P* 

 1      .75    .60       .75    .60       .75    .60 
 2      .75    .50       .79    .55       .83    .60 
 3      .75    .40       .83    .50       .91    .60 
 4      .75    .30       .88    .45       .99    .60 



Figure A1 plots those data. As can be seen, the slope is zero for A 
and highest for C. In short, the slope of reaction time on item 
attributes is partly a function of the accuracy with which the individual 
chooses to respond. 
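
The entries of Table A1 can be recomputed from the response model with a short script. This is a sketch under stated assumptions: the scaling constant D is taken to be 1.7, a value not given in this appendix but one that reproduces the tabled values to within about .01 (the last entry for mode C computes to roughly .996). The function names are illustrative only.

```python
import math

D = 1.7      # logistic scaling constant; an assumption that matches the table
A = 1.0      # common discrimination a_k
THETA = 1.0  # ability level

def phi(k):
    """Item-specific difficulty offset, phi_k = .25k - .25."""
    return 0.25 * k - 0.25

def accuracy(z, k):
    """Accuracy P_z from the logistic model with b = 3z + phi_k."""
    b = 3.0 * z + phi(k)
    return 1.0 / (1.0 + math.exp(-D * A * (THETA - b)))

def z_for_accuracy(p, k):
    """Invert the model: the z that yields accuracy p on item k."""
    b = THETA - math.log(p / (1.0 - p)) / (D * A)
    return (b - phi(k)) / 3.0

# Accuracy targets for modes B and C, as listed in Table A1.
MODE_B = {1: 0.60, 2: 0.55, 3: 0.50, 4: 0.45}
MODE_C = {k: 0.60 for k in range(1, 5)}

for k in range(1, 5):
    a_row = (0.75, accuracy(0.25, k))  # mode A: 1-z held at .75
    b_row = (1.0 - z_for_accuracy(MODE_B[k], k), MODE_B[k])
    c_row = (1.0 - z_for_accuracy(MODE_C[k], k), MODE_C[k])
    print(k, *(f"{v:.2f}" for v in a_row + b_row + c_row))
```

Mode A fixes the response and lets accuracy fall, while modes B and C fix accuracy and solve for the response, which is why the slope of (1-z) on difficulty grows from A to C.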



Insert Figure A1 About Here 

Figure A1 

Relationship Between Response Time and Difficulty 
Level for Three Levels of Accuracy 

APPENDIX B 

Instructions for Mental Rotation Items 

In the following exercises you must respond with 
the joystick and the red button. To get you used 
to responding in this way we want you to practice 
on some samples. In these samples you will see an 
arrow pointing up or down: 

PUSH the stick FORWARD if arrow points UP. 
PULL the stick BACK if arrow points down. 
RESPOND AS QUICKLY AS YOU CAN. 

(Forty arrow items are presented.) 



The first kind of exercise you will work on 
consists of two figures. Your task is to decide 
whether or not the two figures are the same. It is 
important to be FAST and CORRECT. Sometimes the 
two figures will be the same even though the one 
on the right may be drawn at a different angle. 
Sometimes the two figures will not be the same. 



To show you what we mean, in the following example 
the two figures are the same but are at a different 
angle. To see the example press the red button. 
Each time you press the red button the figure on 
the right will move closer to the one on the left. 

(The subject is able to rotate the figure back and forth.) 

If you would like to see this example again push 
the joystick forward. To see the rest of the 
instructions press the red button. 

You just saw an example of figures that are the 
same. In the following example the two figures are 
NOT the same. To see the example press the red 
button. Each time you press the red button the 
figure on the right will come closer to the one on 
the left but they will still not be the same. 

If you would like to see this example again push 
the joystick forward. To see the rest of the 
instructions press the red button. 

First you will take some practice trials. It is 
important to be FAST and CORRECT. However, you can 
pace yourself because with the red button you 
control when to see the next trial. The time you 
take between trials is not counted. 



Respond QUICKLY and CORRECTLY. 
PUSH joystick FORWARD 

if figures are the SAME. 
PULL joystick BACKWARD 

if the figures are NOT THE SAME. 
Press the red button when you are ready 
for the next trial. 



(Six practice items are presented.) 

You are now ready for the real trials. Remember 
Respond QUICKLY and CORRECTLY. 
PUSH joystick FORWARD 

if figures are the SAME. 
PULL joystick BACKWARD 

if the figures are NOT THE SAME. 
Press the red button when you are ready for the 
next trial. 



(The 80 items are next.) 